SUPERCRITICAL PERCOLATION 
ON LARGE SCALE-FREE RANDOM TREES 



JEAN BERTOIN AND GERONIMO URIBE BRAVO 

Abstract. We consider Bernoulli bond percolation on a large scale-free tree in the
supercritical regime, meaning informally that there exists a giant cluster with high
probability. We obtain a weak limit theorem for the sizes of the next largest clusters,
extending a recent result in Bertoin [2012] for large random recursive trees. The approach
relies on the analysis of the asymptotic behavior of branching processes subject to rare
neutral mutations, which may be of independent interest.


1. Introduction and statement of the main result 

Generalizing the procedure to grow graphs and trees of Barabasi and Albert [1999] (see
also Szymanski [1987] for an earlier version), Dorogovtsev et al. [2000] and Mori [2002]
grow a so-called scale-free random tree on a set of ordered vertices, say $\{0, \ldots, n\}$,
using an algorithm with preferential attachment that we now recall. Fix a parameter
$\beta \in (-1, \infty)$, and start for $n = 1$ from the unique tree $T_1$ on $\{0, 1\}$ which
has a single edge connecting $0$ and $1$. Then suppose that $T_n$ has been constructed for
some $n \geq 1$, and for every $i \in \{0, \ldots, n\}$, denote by $d_n(i)$ the degree of the
vertex $i$ in $T_n$. Conditionally given $T_n$, the tree $T_{n+1}$ is derived from $T_n$ by
incorporating the new vertex $n + 1$ and creating an edge between $n + 1$ and a vertex
$v_n \in \{0, \ldots, n\}$ chosen at random according to the law

$$\mathbb{P}(v_n = i \mid T_n) = \frac{d_n(i) + \beta}{2n + \beta(n + 1)}, \qquad i \in \{0, \ldots, n\}.$$

That the preceding indeed defines a probability on $\{0, \ldots, n\}$ for $\beta < \infty$ is
seen from the fact that, since $T_n$ is a tree with $n + 1$ vertices, there is the identity
$\sum_{i=0}^{n} d_n(i) = 2n$.
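
The growth rule above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the function name `grow_scale_free_tree` and the encoding of the tree as a list of parents are choices made here, not notation from the paper.

```python
import random

def grow_scale_free_tree(n, beta, rng=None):
    """Grow T_n on the vertices 0..n (hypothetical helper, not from the paper).

    Returns (parent, degree): parent[i] is the vertex to which i was attached
    (parent[0] is None), and degree[i] is the degree d_n(i) of i in T_n.
    Requires n >= 1 and beta > -1.
    """
    rng = rng or random.Random()
    parent = [None, 0]   # T_1: a single edge between 0 and 1
    degree = [1, 1]
    for m in range(1, n):
        # Vertex m+1 picks v in {0..m} with probability
        # (d_m(v) + beta) / (2m + beta(m+1)); the weights sum to that denominator.
        weights = [d + beta for d in degree]
        v = rng.choices(range(m + 1), weights=weights)[0]
        parent.append(v)
        degree[v] += 1
        degree.append(1)
    return parent, degree
```

Each step costs time linear in the current size, which is enough for small experiments; the sum of the returned degrees is always $2n$, as in the identity above.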

There has been a significant interest in the last decade in Bernoulli bond-percolation on
large scale-free graphs; see in particular Bollobas and Riordan [2003] and Riordan [2005].
In the simpler case of trees, this means that having constructed $T_n$ for some $n \geq 1$,
and for a given parameter $p(n) \in (0, 1)$, we keep each edge with probability $p(n)$ and
remove it with probability $1 - p(n)$, independently of the other edges. This disconnects
$T_n$ into a family of clusters, and the purpose of this work is to study the asymptotic
behavior in distribution of the sizes of the largest clusters as $n \to \infty$, for a
particular regime of the sequence $p(n)$. Specifically, let us write

$$C_0^{(p)} \geq C_1^{(p)} \geq \cdots$$

for the ordered sequence of the sizes of the clusters. 
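
As a minimal sketch of this percolation step, the following Python function computes the ordered cluster sizes. It assumes (this is an encoding chosen here, not the paper's) that the tree is given as a list `parent` with `parent[0] = None` and `parent[i] < i` for $i \geq 1$, which holds for any tree grown by attaching vertices one at a time.

```python
import random
from collections import Counter

def percolation_cluster_sizes(parent, p, rng=None):
    """Ordered cluster sizes after Bernoulli bond percolation (illustrative helper).

    The edge between i and parent[i] is kept with probability p, independently
    of the other edges. Assumes parent[i] < i for every i >= 1.
    """
    rng = rng or random.Random()
    cluster = [0] * len(parent)  # cluster[i] = smallest vertex of i's cluster
    for i in range(1, len(parent)):
        keep = rng.random() < p
        # A kept edge puts i in its parent's cluster; a removed edge starts a new one.
        cluster[i] = cluster[parent[i]] if keep else i
    return sorted(Counter(cluster).values(), reverse=True)
```

The returned list is non-increasing and its entries sum to the number of vertices.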

In the boundary case $\beta \to \infty$ where $v_n$ becomes uniformly distributed on
$\{0, \ldots, n\}$, the algorithm yields a so-called uniform recursive tree (see for instance
Smythe and Mahmoud [1994], Drmota [2009]). It has been observed recently by Bertoin [2012]
that choosing the percolation parameter so that

$$1 - p(n) \sim \frac{c}{\ln n}, \tag{1}$$

where $c > 0$ is fixed, corresponds precisely to the supercritical regime, in the sense that
both the largest percolation cluster on a random recursive tree of size $n \gg 1$ and its
complement have a size of order $n$ with high probability. Specifically, the largest cluster
has a size close to $e^{-c} n$ whereas the next largest clusters have size of order
$n / \ln n$ only and are approximately distributed according to some Poisson random measure
with intensity $c e^{-c} x^{-2} dx$.

The main purpose of this work is to show that a similar result holds more generally for 
large scale-free random trees. 

Theorem 1. Set $\alpha = (1 + \beta)/(2 + \beta)$, and assume that the percolation parameter
$p(n)$ fulfills (1). Then

$$\lim_{n \to \infty} n^{-1} C_0^{(p)} = e^{-\alpha c} \quad \text{in probability},$$

and for every fixed integer $j$,

$$\left( \frac{\ln n}{n} C_1^{(p)}, \ldots, \frac{\ln n}{n} C_j^{(p)} \right)$$

converges in distribution towards

$$(x_1, \ldots, x_j)$$

where $x_1 > x_2 > \cdots$ denotes the sequence of the atoms of a Poisson random measure on
$(0, \infty)$ with intensity

$$c e^{-\alpha c} x^{-2} dx.$$

Equivalently, $1/x_1, 1/x_2 - 1/x_1, \ldots, 1/x_j - 1/x_{j-1}$ are i.i.d. exponential
variables with parameter $c e^{-\alpha c}$.
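
The equivalent description by exponential increments gives a direct way to sample the first atoms of the limiting Poisson random measure. The sketch below uses that description; the function name `poisson_atoms` and its arguments are illustrative choices made here.

```python
import math
import random

def poisson_atoms(c, alpha, j, rng=None):
    """Sample the j largest atoms x_1 > ... > x_j of a Poisson random measure
    on (0, infinity) with intensity c * exp(-alpha * c) * x**(-2) dx.

    Uses the fact that 1/x_1, 1/x_2 - 1/x_1, ... are i.i.d. exponential
    variables with parameter c * exp(-alpha * c).
    """
    rng = rng or random.Random()
    rate = c * math.exp(-alpha * c)
    atoms, inverse = [], 0.0
    for _ in range(j):
        inverse += rng.expovariate(rate)  # next increment of 1/x
        atoms.append(1.0 / inverse)
    return atoms
```

For instance, `poisson_atoms(2.0, 0.6, 5)` returns five decreasing positive numbers, which in the regime of Theorem 1 are to be compared with the rescaled cluster sizes.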

It is remarkable that the intensity measure in the statement only depends on the
parameter $\beta$ through the constant factor $e^{-\alpha c}$. It should also be noted that
the map $\beta \mapsto \alpha(\beta) = (1 + \beta)/(2 + \beta)$ increases, and we then see from
Theorem 1 that for the same value of the percolation parameter $p(n)$ and $n \gg 1$, this
intensity decreases with the parameter $\beta$. This can be explained informally by the fact
that when the parameter $\beta$ is larger, the algorithm with preferential attachment produces
random trees which are less tufty and thus more affected by percolation.

The approach used in Bertoin [2012] for recursive trees relies crucially on special proper- 
ties of the latter, and more specifically on a remarkable coupling due to Iksanov and Mohle 
[2007] connecting the Meir and Moon algorithm for the isolation of the root with a certain 
random walk in the domain of attraction of the completely asymmetric Cauchy process. 
This fails for scale-free trees, and we thus have to use here a fairly different route. 

It is well-known that growing random scale-free trees bears close relations to Yule pro- 
cesses. We shall incorporate an independent Bernoulli percolation to the algorithm with 
preferential attachment and interpret this in terms of neutral mutations which are super- 
posed to the structure of the branching process. This leads us to investigate in Section 2 
the asymptotic behavior of a system of branching processes with rare neutral mutations 
up to a large random time, in certain regimes when the small mutation parameter is 
related to the size of the total population. We then specify in Section 3 those results to 
Yule processes, make the link with percolation on scale-free trees and prove Theorem 1. 

2. Branching processes with rare neutral mutations 

The main purpose of this section is to establish some general results about the long
time behavior of a system of branching processes with rare neutral mutations in a certain 
specific regime. The system is presented in the first sub-section, and then asymptotic 
results are established in the second. 

2.1. Description of the system of branching processes with mutations. We start
by considering a pure birth branching process $Z = (Z(t) : t \geq 0)$ in continuous space,
with unit birth rate per unit population size and reproduction law $\nu$, where $\nu$ denotes
a probability measure on $(0, \infty)$. This means that $Z$ is a non-decreasing Markovian jump
process such that when $Z(0) = z > 0$, its first jump occurs after an exponential time with
parameter $z$, and the jump size has law $\nu$.

We assume that the second moment of $\nu$ is finite, which is more than sufficient to
ensure that $Z$ never explodes a.s. Recall that $\beta > -1$ is some fixed parameter. We
further suppose that $\nu((0, 1 + \beta]) = 0$ (the role of this assumption shall be plain
later on), so that when a birth event occurs, the population always increases by an amount at
least $1 + \beta$. We shall be mainly interested in a class of population systems which arise
by incorporating neutral mutations to the preceding branching process. It may be useful
to think of Kimura's infinite site model, in which a genetic type consists of an infinite
sequence of letters and each mutation affects a different locus. In particular, one can
reconstruct the genealogy of the types by comparison with the ancestral type; see e.g.
Section 2 in Bertoin [2010] for a closely related setting.

More precisely, let $\mathbb{U} = \bigcup_{n \geq 0} \mathbb{N}^n$ denote the Ulam tree, with the
convention that $\mathbb{N}^0 = \{\emptyset\}$. That is, each element $u \in \mathbb{U}$ is a
finite sequence $u = (u_1, \ldots, u_n)$ of positive integers, whose length $|u| = n$
corresponds to the height of $u$ in $\mathbb{U}$, and the empty sequence $\emptyset$ serves as
the root of $\mathbb{U}$. Each vertex $u \in \mathbb{U}$ corresponds to a genetic type; in
particular we view $\emptyset$ as the ancestral type, and for every
$u = (u_1, \ldots, u_n) \in \mathbb{U}$ and $j \in \mathbb{N}$, the $j$-th child of $u$,
$uj = (u_1, \ldots, u_n, j)$, represents the new genetic type which appears at the instant
when the $j$-th mutation occurs in the sub-population with type $u$.

The state of the population system at a given time is given by a collection of nonnegative
real numbers $(z_u : u \in \mathbb{U})$, where $z_u$ is the current size of the sub-population
with type $u$. The evolution of the system is thus described by a process
$\mathbf{Z} = (\mathbf{Z}(t), t \geq 0)$, where for each $t \geq 0$,
$\mathbf{Z}(t) = (Z_u(t) : u \in \mathbb{U})$ is a collection of nonnegative variables indexed
by Ulam's tree. At the initial time, all the $Z_u(0)$ are taken to be equal to zero, except
$Z_\emptyset(0)$ which is the size of the ancestral population.

We then describe the random evolution of the system $\mathbf{Z}$, which depends on a
parameter $p \in [0, 1]$. Recall that the reproduction law $\nu$ assigns no mass to
$(0, 1 + \beta]$, so we may consider a positive random variable $\xi$ such that
$\xi + 1 + \beta$ has the law $\nu$. We imagine that mutations occur at rate $1 - p$ per unit
population size, always produce a single mutant population of fixed size $1 + \beta$, and are
neutral, in the sense that they do not affect the reproduction law. In particular, the
different populations present in the system (i.e. with strictly positive sizes) evolve
independently of one another and according to the same random dynamics. For each
sub-population, say with size $z > 0$, we introduce an independent copy $\xi'$ of $\xi$, a
Bernoulli variable $\varepsilon_p$ with parameter $p$, and an exponentially distributed
variable $\zeta_z$ with parameter $z$. We assume that these three variables are independent,
and also independent of the other variables associated to the other sub-populations. The
time $\zeta_z$ corresponds to the first birth event in that sub-population. The total size of
the children born at this birth event is $\xi' + 1 + \beta$; the variable $\varepsilon_p$
specifies whether a mutation occurs. Specifically, the size of clone children is
$\xi' + \varepsilon_p(1 + \beta)$ and the size of mutant children is
$(1 - \varepsilon_p)(1 + \beta)$. So mutation occurs if and only if $\varepsilon_p = 0$, an
event which has probability $1 - p$.
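
The dynamics just described can be simulated event by event. The following is a minimal sketch under stated assumptions: the function name, the encoding of types as tuples of positive integers (with `()` for the ancestral type), and the user-supplied sampler `sample_xi` for the law of $\xi$ are all illustrative choices made here.

```python
import random

def simulate_mutation_system(z0, beta, p, t_max, sample_xi, rng=None):
    """Simulate the population system up to time t_max (illustrative sketch).

    Returns a dict mapping each type (a tuple, () being the ancestral type)
    to the size of its sub-population at time t_max.
    """
    rng = rng or random.Random()
    sizes = {(): float(z0)}
    mutations = {(): 0}          # number of mutations seen so far in each type
    t = 0.0
    while True:
        total = sum(sizes.values())
        t += rng.expovariate(total)       # next birth event, at rate = total size
        if t > t_max:
            return sizes
        types = list(sizes)
        # The birth occurs in type u with probability proportional to its size.
        u = rng.choices(types, weights=[sizes[v] for v in types])[0]
        xi = sample_xi(rng)
        if rng.random() < p:              # epsilon_p = 1: no mutation
            sizes[u] += xi + 1 + beta
        else:                             # epsilon_p = 0: a mutant of size 1 + beta
            sizes[u] += xi
            mutations[u] += 1
            child = u + (mutations[u],)
            sizes[child] = 1 + beta
            mutations[child] = 0
```

For example, `simulate_mutation_system(3.0, 0.5, 0.9, 1.0, lambda r: 1.0)` runs the system with $\xi \equiv 1$; whatever $p$ is, every birth event increases the total size by $\xi + 1 + \beta$.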

In order to underline the role of the rate of mutation, we henceforth write

$$\mathbf{Z}^{(p)} = (Z_u^{(p)}(t) : t \geq 0, u \in \mathbb{U})$$

instead of $\mathbf{Z}$. It should be obvious however that, no matter what $p$ is, the process
of the total size of the population

$$Z(t) = \sum_{u \in \mathbb{U}} Z_u^{(p)}(t), \qquad t \geq 0,$$

is distributed as the branching process described at the beginning of this section.

Clearly, the process of the size of the sub-population with the ancestral type
$(Z_\emptyset^{(p)}(t) : t \geq 0)$ is a continuous time branching process in continuous space
with reproduction law given by the distribution of $\varepsilon_p(1 + \beta) + \xi$. More
generally, if for $u \in \mathbb{U}$, we write

$$b_u^{(p)} = \inf\{t \geq 0 : Z_u^{(p)}(t) > 0\}$$

for the birth time of the sub-population with type $u$, then each process

$$(Z_u^{(p)}(t + b_u^{(p)}), t \geq 0)$$

is a branching process with the same reproduction law as $Z_\emptyset^{(p)}$ and starting from
$1 + \beta$ for $u \neq \emptyset$. Further, it should be intuitively clear (although this
shall not be needed here) that the processes $Z_u^{(p)}(b_u^{(p)} + \cdot)$ for
$u \in \mathbb{U}$ are independent. Focussing on types of the first generation
$\mathbb{N}^1$, i.e. bearing a single mutation, we point at a useful independence property
involving the birth times:

Lemma 1. The processes $(Z_i^{(p)}(b_i^{(p)} + t) : t \geq 0)$ for $i \geq 1$ form a sequence
of i.i.d. branching processes with reproduction distributed according to
$\xi + \varepsilon_p(1 + \beta)$, and starting point $1 + \beta$. Further, this sequence is
independent of that of the birth-times $(b_i^{(p)})_{i \geq 1}$ and of the process
$Z_\emptyset^{(p)}$ of the sub-population with the ancestral type.

Remark. It is crucial in this statement to focus on sub-populations bearing the same
number of mutations; for instance the independence property of the birth-times would
fail if we considered the whole family of processes $Z_u^{(p)}(b_u^{(p)} + \cdot)$ for
$u \in \mathbb{U} \setminus \{\emptyset\}$.
Proof: Let $(X, M)$ be a continuous-time Markov chain with values in
$\mathbb{R}_+ \times \mathbb{Z}_+$ with two types of transition:

$$(x, m) \mapsto (x + dx, m) \quad \text{at rate } x p \, \nu(dx),$$
$$(x, m) \mapsto (x + dx, m + 1) \quad \text{at rate } x (1 - p) \, \nu(1 + \beta + dx).$$

In particular, $X$ is a branching process distributed as $Z_\emptyset^{(p)}$ and we can
interpret $M$ as the process of the number of mutation events which occur within the
sub-population with the ancestral type.

Let $\gamma_1 < \gamma_2 < \cdots$ denote the sequence of jump times of $M$ and set
$\gamma_0 = 0$. Independently of $(X, M)$, let $(X_i, i \in \mathbb{N})$ be a sequence of
i.i.d. branching processes with the same law as $Z_\emptyset^{(p)}$ but with starting value
$1 + \beta$. We then form the process

$$\mathbf{X}(t) = \big(X(t), \mathbf{1}_{\{t \geq \gamma_1\}} X_1(t - \gamma_1), \mathbf{1}_{\{t \geq \gamma_2\}} X_2(t - \gamma_2), \ldots\big), \qquad t \geq 0.$$

The analysis of jump times and positions then readily shows that $\mathbf{X}$ is Markovian
and has the same law as $(Z_\emptyset^{(p)}, Z_1^{(p)}, Z_2^{(p)}, \ldots)$. □

2.2. Asymptotics for rare mutations. Recall the assumption that the reproduction
law $\nu$ of the branching process $Z$ has a finite second moment and write

$$m_1 = \int x \, \nu(dx) \quad \text{and} \quad m_2 = \int x^2 \, \nu(dx).$$

It is well-known that

$$W(t) := e^{-m_1 t} Z(t), \qquad t \geq 0,$$

is then a nonnegative square-integrable martingale, and we write $W(\infty)$ for its terminal
value. Furthermore $W(\infty) > 0$ a.s. since $Z$ cannot become extinct (cf. Theorem 2 p. 112
in Chapter III of Athreya and Ney [2004] for the general assertion and Example 5.4.3 p.
253 of Durrett [2010] just for the finite variance case).

It is easily checked that the speed of convergence of the martingale $W$ is exponential.
Specifically, if we write $\mathbb{P}_z$ for the distribution of the branching process $Z$
started from $z > 0$, then the following general bound holds (see footnote 1).

Lemma 2. For every $t \geq 0$, there is the upper-bound

$$\mathbb{E}_z\Big( \sup_{s \geq t} |W(s) - W(\infty)|^2 \Big) \leq 10 z \frac{m_2}{m_1} e^{-m_1 t}.$$

As a consequence, we have

$$\mathbb{E}_z\Big( \sup_{s \geq 0} e^{m_1 s/3} |W(s) - W(\infty)|^2 \Big) \leq \frac{10 z m_2 e^{2 m_1/3}}{m_1 (1 - e^{-m_1/6})^2}.$$


Footnote 1. The assumption that $\nu$ assigns no mass to $(0, 1 + \beta]$ plays no role here,
and Lemma 2 holds when this assumption is dropped.

Proof: By Doob's inequality and basic properties of square integrable martingales, we
have

$$\mathbb{E}_z\Big( \sup_{s \geq t} |W(s) - W(\infty)|^2 \Big) \leq 10 \, \mathbb{E}_z([W]_\infty - [W]_t)$$

where

$$[W]_t = \sum_{0 < s \leq t} |\Delta W(s)|^2 = \sum_{0 < s \leq t} e^{-2 m_1 s} |Z(s) - Z(s-)|^2.$$

A straightforward calculation shows that the compensator of the jump process $[W]$ is

$$\langle W \rangle_t = m_2 \int_0^t e^{-2 m_1 s} Z(s) \, ds,$$

that is $[W]_t - \langle W \rangle_t$ is a local martingale. Finally observe that

$$\mathbb{E}_z(e^{-2 m_1 s} Z(s)) = e^{-m_1 s} \mathbb{E}_z(W(s)) = z e^{-m_1 s},$$

so

$$\mathbb{E}_z(\langle W \rangle_\infty - \langle W \rangle_t) = z \frac{m_2}{m_1} e^{-m_1 t}.$$

This enables us to assert that

$$\mathbb{E}_z([W]_\infty - [W]_t) = \mathbb{E}_z(\langle W \rangle_\infty - \langle W \rangle_t),$$

and our first statement follows.

Turning our attention to the second inequality, we write for every integer $n \geq 0$

$$\sup_{n \leq s \leq n+1} e^{m_1 s/3} |W(s) - W(\infty)| \leq e^{m_1 (n+1)/3} \sup_{n \leq s \leq n+1} |W(s) - W(\infty)|.$$

It follows from the first part that the $L^2$-norm of the right hand side can be bounded
from above by

$$e^{m_1 (n+1)/3} \Big\| \sup_{s \geq n} |W(s) - W(\infty)| \Big\|_2 \leq \sqrt{10 z \frac{m_2}{m_1}} \, e^{-m_1 (n-2)/6},$$

so taking the sum over $n$ and applying Minkowski's inequality yields the stated bound.
□
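
For the reader's convenience, the final summation can be spelled out as follows; this is only a sketch of the arithmetic, assuming the per-interval bound displayed in the proof.

```latex
% Summing the per-interval L^2 bounds over n >= 0 (geometric series):
\sum_{n \geq 0} e^{-m_1 (n-2)/6}
  = e^{m_1/3} \sum_{n \geq 0} e^{-m_1 n/6}
  = \frac{e^{m_1/3}}{1 - e^{-m_1/6}} .
% Minkowski's inequality then bounds the L^2-norm of the supremum over s >= 0:
\Big\| \sup_{s \geq 0} e^{m_1 s/3} |W(s) - W(\infty)| \Big\|_2
  \leq \sqrt{10 z \frac{m_2}{m_1}} \; \frac{e^{m_1/3}}{1 - e^{-m_1/6}} ,
% and squaring gives a constant of the form
\frac{10 z \, m_2 \, e^{2 m_1/3}}{m_1 \, (1 - e^{-m_1/6})^2} .
```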

The main purpose of this section is to specify the joint asymptotic behaviors of the
branching processes $Z_\emptyset^{(p)}$ and $Z_i^{(p)}$ for $i \in \mathbb{N}$ in appropriate
regimes when $p \to 1$ and time tends to $\infty$. In this direction, we denote the mean
reproduction of $Z_\emptyset^{(p)}$ by

$$m_1(p) = \mathbb{E}(\xi) + p(1 + \beta),$$

and recall that the process

$$W_\emptyset^{(p)}(t) = e^{-m_1(p) t} Z_\emptyset^{(p)}(t), \qquad t \geq 0,$$

is a martingale with terminal value denoted by $W_\emptyset^{(p)}(\infty)$. For each fixed
$t \geq 0$, we have $\lim_{p \to 1} W_\emptyset^{(p)}(t) = W(t)$, and on the other hand, we know
that $\lim_{t \to \infty} W(t) = W(\infty)$ in $L^2$. As a matter of fact, we have a stronger
uniform convergence.

Lemma 3. It holds that

$$\lim_{p \to 1, \, t \to \infty} \mathbb{E}_z\Big( \sup_{s \geq t} |W_\emptyset^{(p)}(s) - W(\infty)|^2 \Big) = 0.$$


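Proof: Note that for $1/2 \leq p \leq 1$, we have $m_1(p) \geq m_1/2$ and the second moment of
$\xi + \varepsilon_p(1 + \beta)$ is at most $m_2$. We deduce from Lemma 2 applied to the
branching process $Z_\emptyset^{(p)}$ that for every fixed $\varepsilon > 0$, we can find
$t_\varepsilon < \infty$ such that

$$\mathbb{E}_z\Big( \sup_{s \geq t_\varepsilon} |W_\emptyset^{(p)}(s) - W_\emptyset^{(p)}(\infty)|^2 \Big) \leq \varepsilon \quad \text{for all } p \in [1/2, 1]. \tag{2}$$

We next claim that

$$\lim_{p \to 1} \mathbb{E}_z\big( |W_\emptyset^{(p)}(t_\varepsilon) - W(t_\varepsilon)|^2 \big) = 0. \tag{3}$$

Indeed, recall that $b_1^{(p)}$ denotes the first birth time of a mutant population. Plainly,
$\lim_{p \to 1} b_1^{(p)} = \infty$ in probability, and the probability of the event
$\{t_\varepsilon \geq b_1^{(p)}\}$ can be made as small as we wish by choosing $p$ sufficiently
close to 1. On the one hand, as $Z_\emptyset^{(p)}(t_\varepsilon) \leq Z(t_\varepsilon)$, we have

$$\mathbb{E}_z\big( |W_\emptyset^{(p)}(t_\varepsilon) - W(t_\varepsilon)|^2, \, t_\varepsilon \geq b_1^{(p)} \big) \leq \big( e^{2(m_1 - m_1(p)) t_\varepsilon} + 1 \big) \mathbb{E}_z\big( |W(t_\varepsilon)|^2, \, t_\varepsilon \geq b_1^{(p)} \big),$$

and the right-hand side goes to $0$ as $p \to 1$. On the other hand, on the event
$\{t_\varepsilon < b_1^{(p)}\}$, we have $Z_\emptyset^{(p)}(t_\varepsilon) = Z(t_\varepsilon)$ and
hence $W_\emptyset^{(p)}(t_\varepsilon) = e^{(m_1 - m_1(p)) t_\varepsilon} W(t_\varepsilon)$. This
yields

$$\mathbb{E}_z\big( |W_\emptyset^{(p)}(t_\varepsilon) - W(t_\varepsilon)|^2, \, t_\varepsilon < b_1^{(p)} \big) \leq \big( e^{(m_1 - m_1(p)) t_\varepsilon} - 1 \big)^2 \mathbb{E}_z\big( |W(t_\varepsilon)|^2 \big),$$

and again the right-hand side goes to $0$ as $p \to 1$. This establishes (3).
Combining (2) and (3), we get

$$\limsup_{p \to 1} \mathbb{E}_z\big( |W(\infty) - W_\emptyset^{(p)}(\infty)|^2 \big) \leq 4 \varepsilon,$$

and since $\varepsilon > 0$ can be chosen arbitrarily small, we have in fact

$$\lim_{p \to 1} \mathbb{E}_z\big( |W(\infty) - W_\emptyset^{(p)}(\infty)|^2 \big) = 0. \tag{4}$$

Plugging this in (2), we conclude that

$$\limsup_{p \to 1} \mathbb{E}_z\Big( \sup_{s \geq t_\varepsilon} |W_\emptyset^{(p)}(s) - W(\infty)|^2 \Big) \leq \varepsilon,$$

which is equivalent to our statement. □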
We next turn our attention to the asymptotic behavior of the birth times $b_i^{(p)}$ for
$i = 1, 2, \ldots$ of the different types with a single mutation.

Lemma 4. As $p \to 1$, the sequence

$$\frac{1 - p}{m_1} W(\infty) \exp\big( m_1(p) b_i^{(p)} \big), \qquad i \geq 1,$$

converges in the sense of finite-dimensional distributions towards

$$S_i := e_1 + \cdots + e_i, \qquad i \geq 1,$$

where $(e_j)_{j \geq 1}$ denotes a sequence of i.i.d. standard exponential variables.

We stress that, thanks to Lemma 1, the sequence above is independent of the processes
$(Z_i^{(p)}(b_i^{(p)} + t) : t \geq 0)$ for $i \geq 1$. This observation will be important later
on.
Proof: Define

$$I^{(p)}(t) := (1 - p) \int_0^t Z_\emptyset^{(p)}(s) \, ds, \qquad t \geq 0.$$

The random map $I^{(p)} : [0, \infty) \to [0, \infty)$ is a.s. bijective, and we denote its
inverse by $J^{(p)}$. It follows immediately from the description of the population system
that if we time-change the process $t \mapsto M^{(p)}(t)$, which counts the number of types
with a single mutation, by $J^{(p)}$, then we get another counting process
$t \mapsto M^{(p)} \circ J^{(p)}(t)$ with unit jump rate. In other words,
$M^{(p)} \circ J^{(p)}$ is a standard Poisson process, and therefore the sequence of its
jump-times is given by a random walk $S$ with exponentially distributed steps with unit
mean (see footnote 2). Since the birth-times $b_i^{(p)}$ for $i \geq 1$ are the jump-times of
$M^{(p)}$, this yields

$$I^{(p)}(b_i^{(p)}) = S_i, \qquad i \geq 1.$$

We now only need to estimate $I^{(p)}(t)$ as both $p \to 1$ and $t \to \infty$. In this
direction observe from the triangle inequality that

$$\Big| I^{(p)}(t) - \frac{1 - p}{m_1(p)} \big( e^{m_1(p) t} - 1 \big) W_\emptyset^{(p)}(\infty) \Big| \leq (1 - p) \int_0^t \big| Z_\emptyset^{(p)}(s) - W_\emptyset^{(p)}(\infty) e^{m_1(p) s} \big| \, ds$$
$$= (1 - p) \int_0^t \big| W_\emptyset^{(p)}(s) - W_\emptyset^{(p)}(\infty) \big| e^{m_1(p) s} \, ds$$
$$\leq (1 - p) A^{(p)} e^{2 m_1(p) t/3},$$

where

$$A^{(p)} := \frac{3}{2 m_1(p)} \sup_{s \geq 0} e^{m_1(p) s/3} \big| W_\emptyset^{(p)}(s) - W_\emptyset^{(p)}(\infty) \big|.$$

Recall from Lemma 2 that the variables $A^{(p)}$ are bounded in $L^2(\mathbb{P})$ for
$1/2 \leq p \leq 1$; we deduce that

$$\lim_{t \to \infty} \mathbb{E}\Big( \sup_{s \geq t} \Big| e^{-m_1(p) s} \frac{I^{(p)}(s)}{1 - p} - \frac{W_\emptyset^{(p)}(\infty)}{m_1(p)} \Big|^2 \Big) = 0 \quad \text{uniformly in } 1/2 \leq p < 1.$$

Recall also from (4) that $\lim_{p \to 1} W_\emptyset^{(p)}(\infty) = W(\infty)$ in
$L^2(\mathbb{P})$, where $W(\infty)$ is strictly positive a.s., and note that
$b_i^{(p)} \to \infty$ in probability for every $i \geq 1$. It follows that

$$S_i = I^{(p)}(b_i^{(p)}) \sim \frac{1 - p}{m_1(p)} e^{m_1(p) b_i^{(p)}} W(\infty) \quad \text{in probability},$$

and clearly we may replace $m_1(p)$ by $m_1$ in the fraction above. □
We have all the technical ingredients to establish the main result of this section, but
we still need some additional notation. For each $p \in (0, 1)$, consider a random time
$\tau^{(p)}$ such that

$$\lim_{p \to 1} \big( m_1(p) \tau^{(p)} + \ln(1 - p) \big) = \infty \quad \text{in probability}. \tag{5}$$

Let $W'(\infty)$ be a variable distributed as the terminal value of the martingale
$W(t) = e^{-m_1 t} Z(t)$ where the starting point is now $Z(0) = 1 + \beta$. We introduce
$(W_i'(\infty) : i \geq 1)$, a sequence of i.i.d. copies of $W'(\infty)$. We finally recall that
$(S_k : k \geq 0)$ denotes a random walk with i.i.d. steps distributed according to the
standard exponential law. We implicitly assume that $(W_i'(\infty) : i \geq 1)$ and
$(S_k : k \geq 0)$ are independent.

Footnote 2. This random walk depends on the parameter $p$, however since only its law is
relevant in this proof, this will be omitted from the notation for simplicity.

Theorem 2. As $p \to 1$, the sequence

$$\left( \frac{e^{-m_1(p) \tau^{(p)}}}{(1 - p) W(\infty)} Z_i^{(p)}(\tau^{(p)}) : i \geq 1 \right)$$

converges in the sense of finite-dimensional distributions towards

$$\left( \frac{W_i'(\infty)}{m_1 S_i} : i \geq 1 \right).$$


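Proof: Recall that $b_i^{(p)}$ denotes the instant of the $i$-th mutation in the
sub-population with the ancestral type, and set for $i \geq 1$ and $t \geq 0$,

$$W_i^{(p)}(t) = e^{-m_1(p) t} Z_i^{(p)}(b_i^{(p)} + t).$$

Fix $z > 0$. By Lemma 3, for every continuous $f : \mathbb{R} \to \mathbb{R}$ bounded in
absolute value by 1 and every $\varepsilon > 0$, there exist $t(f, \varepsilon)$ and
$p(f, \varepsilon)$ such that if $p(f, \varepsilon) < p < 1$ then

$$\Big| \mathbb{E}_z\big[ f\big( W_\emptyset^{(p)}(t(f, \varepsilon)) \big) \big] - \mathbb{E}_z\big( f(W(\infty)) \big) \Big| \leq \varepsilon$$

and

$$\mathbb{E}_z\Big( \sup_{t_1, t_2 \geq t(f, \varepsilon)} \big| f\big( W_\emptyset^{(p)}(t_1) \big) - f\big( W_\emptyset^{(p)}(t_2) \big) \big| \Big) \leq \varepsilon.$$

Without loss of generality, we may also assume that the same inequalities hold with
$W_\emptyset^{(p)}$ and $W(\infty)$ replaced by $W_i^{(p)}$ and $W_i'(\infty)$ for any
$i \in \mathbb{N}$, since this amounts to taking $z = 1 + \beta$.

Consider then for each $i \geq 1$, a family of random times $(t_i^{(p)})_{0 < p < 1}$, such that
$\lim_{p \to 1} t_i^{(p)} = \infty$ in probability. Since we can guarantee that
$\mathbb{P}_z(t_i^{(p)} < t(f, \varepsilon)) < \varepsilon$ for $p > p(i, f, \varepsilon)$, we see
that

$$\Big| \mathbb{E}_z\big( f\big( W_i^{(p)}(t_i^{(p)}) \big) \big) - \mathbb{E}\big( f(W_i'(\infty)) \big) \Big| \leq 2 \varepsilon$$

if $p > p(i, f, \varepsilon)$. The independence of the $W_i^{(p)}$ is seen from Lemma 1, and we
have deduced the weak convergence (in the sense of finite dimensional distributions)

$$\big( W_i^{(p)}(t_i^{(p)}) : i \geq 1 \big) \Longrightarrow \big( W_i'(\infty) : i \geq 1 \big).$$

Recall from Lemma 4 that we have also

$$\left( \frac{\exp\big( -m_1(p) b_i^{(p)} \big)}{(1 - p) W(\infty)} : i \geq 1 \right) \Longrightarrow \left( \frac{1}{m_1 S_i} : i \geq 1 \right).$$

More precisely, we deduce from Lemma 1 that these two weak convergences hold jointly,
provided that we take the sequences $(W_i'(\infty) : i \geq 1)$ and $(S_i : i \geq 1)$ to be
independent.

To complete the proof, it now suffices to set $t_i^{(p)} = \tau^{(p)} - b_i^{(p)}$ for
$i \geq 1$ and take the product of the preceding weak limits. Note that Lemma 4 and the
assumption (5) ensure that indeed $\lim_{p \to 1} t_i^{(p)} = \infty$. □

3. Combining preferential attachment with percolation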



We are now able to start investigating the question which has motivated this article.
It is convenient for our purpose to work with a continuous version of the preferential
attachment algorithm, in the sense that we shall grow a scale-free tree in continuous time.
That is, we start at time $0$ from the tree on $\{0, 1\}$, and once the random tree on the
vertex set $\{0, \ldots, n\}$ has been constructed for some $n \geq 1$, we equip each vertex
$i \in \{0, \ldots, n\}$ with an exponential clock $\zeta_i$ with parameter $d_n(i) + \beta$,
where $d_n(i)$ denotes the current degree of $i$, independently of the other vertices. Then
the next vertex $n + 1$ is attached after time $\min_{i \in \{0, \ldots, n\}} \zeta_i$ at the
vertex $v_n = \mathrm{argmin}_{i \in \{0, \ldots, n\}} \zeta_i$. Recall that the sum of the
degrees of a tree with $n + 1$ vertices is $2n$, so $\min_{i \in \{0, \ldots, n\}} \zeta_i$ is
exponentially distributed with parameter $2n + \beta(n + 1)$.
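
Since the successive waiting times are independent exponential variables with parameters $2k + \beta(k + 1)$, the time needed to reach $n + 1$ vertices is easy to sample. The sketch below illustrates this; the function names `sample_growth_time` and `mean_growth_time` are hypothetical helpers introduced here.

```python
import math
import random

def sample_growth_time(n, beta, rng=None):
    """Sample the time at which the continuous-time tree reaches n + 1 vertices.

    While the tree has k + 1 vertices (k = 1, ..., n - 1), the next vertex
    arrives after an exponential time with parameter 2k + beta(k + 1).
    """
    rng = rng or random.Random()
    return sum(rng.expovariate(2 * k + beta * (k + 1)) for k in range(1, n))

def mean_growth_time(n, beta):
    """Expectation of the time sampled by sample_growth_time."""
    return sum(1.0 / (2 * k + beta * (k + 1)) for k in range(1, n))
```

For large $n$ the mean is close to $\ln n / (2 + \beta)$, up to a bounded correction; this can be checked numerically by comparing `mean_growth_time(n, beta)` with `math.log(n) / (2 + beta)`.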

Denote by $T(t)$ the tree which has been constructed at time $t$, and by $|T(t)|$ its size,
i.e. its number of vertices. It should be plain that if we define

$$\tau_n = \inf\{t \geq 0 : |T(t)| = n + 1\},$$

then $T(\tau_n)$ is a version of a scale-free tree of size $n + 1$, $T_n$. The process of the
size $|T(t)|$ of $T(t)$ is clearly Markovian; however it will be more convenient in practice
to work with a linear transformation of it, namely

$$Y(t) = 2(|T(t)| - 1) + \beta |T(t)|, \qquad t \geq 0.$$

In particular $Y(0) = 2 + 2\beta$.

Lemma 5. The process $Y$ is a pure birth branching process, that has only jumps of size
$2 + \beta$, and with unit birth rate per unit population size. Equivalently,
$(2 + \beta)^{-1} Y$ is a Yule branching process in continuous space with birth rate
$2 + \beta$ per unit population size.

Proof: The sum of degrees of vertices in $T(t)$ is $2(|T(t)| - 1)$. Because when this tree
has size $n$, the next vertex is incorporated at rate $2(n - 1) + \beta n$, which yields an
increase of $Y$ by $2 + \beta$, we see that $Y$ is a branching process in continuous space and
time with unit rate of birth per unit population size, and reproduction law given by the
Dirac point mass at $2 + \beta$. Normalizing $Y$ by a factor $(2 + \beta)^{-1}$ we recognize the
dynamics of a Yule process. □

We next superpose Bernoulli bond percolation to this construction by marking each
edge $e_j$ connecting a vertex $j \geq 1$ to its parent $v_j$ with an independent uniform
random variable $U_j$. The parameter $p \in (0, 1)$ being fixed, we may imagine that $e_j$ is
cut at its midpoint when the mark $U_j > p$ and remains intact otherwise. We write
$T^{(p)}(t)$ for the resulting combinatorial structure at time $t$. That is, $T^{(p)}(t)$ has
the same set of vertices as $T(t)$, its set of intact edges is the subset of the edges $e_j$
of $T(t)$ such that $U_j \leq p$, and further $T^{(p)}(t)$ may have half-edges which should be
viewed as stubs attached to some vertices and correspond to edges of $T(t)$ which have been
cut in two. The point in cutting rather than removing edges is that the former procedure
preserves the degrees of vertices, where the degree of a vertex is defined as the sum of the
intact edges and half-edges attached to it.

The percolation clusters of $T^{(p)}(t)$ are the subtrees of $T(t)$ formed by the subsets of
vertices which can be connected by a path of intact edges. We write
$T_0^{(p)}(t), T_1^{(p)}(t), \ldots$ for the sequence of subtrees at time $t$, where the
enumeration follows the increasing order of their birth times, and with the convention that
$T_j^{(p)}(t) = \emptyset$ when the number of edges that have been cut at time $t$ is less than
$j$. Specifically, if $j$ is the label of the $i$-th variable $U_j$ to be greater than $p$,
then $T_i^{(p)}(t)$ is the combinatorial structure spanned by the vertices that can be joined
by a path of intact edges to the vertex $j$. In particular $T_0^{(p)}(t)$ denotes the subtree
at time $t$ which contains the vertex $0$; it shall play a special role in our analysis.

We write $H_i^{(p)}(t)$ for the number of half-edges pertaining to the $i$-th subtree at time
$t$, so that $2(|T_i^{(p)}(t)| - 1) + H_i^{(p)}(t)$ is the sum of the degrees of vertices of
the $i$-th subtree. We stress that

$$\sum_{i \geq 0} |T_i^{(p)}(t)| = |T(t)| \quad \text{and} \quad \sum_{i \geq 0} \big( 2(|T_i^{(p)}(t)| - 1) + H_i^{(p)}(t) \big) = 2(|T(t)| - 1).$$

If we set

$$Y_i^{(p)}(t) = 2(|T_i^{(p)}(t)| - 1) + H_i^{(p)}(t) + \beta |T_i^{(p)}(t)|, \qquad t \geq 0,$$

then we see from above that

$$\sum_{i \geq 0} Y_i^{(p)}(t) = Y(t). \tag{6}$$


The connexion with the system of branching processes with neutral mutations of the
preceding section should be clear. Specifically, imagine that at some given time $t$, the
state of the process $\mathbf{Y}^{(p)} = (Y_j^{(p)} : j \geq 0)$ is given by
$(y_0, y_1, \ldots)$, and write $y = y_0 + y_1 + \cdots$. In particular the current size of the
growing tree is $|T(t)| = (y + 2)/(2 + \beta)$ and we know from Lemma 5 that the next vertex
will be incorporated after an exponential time with parameter $y$. The probability that the
edge corresponding to this new vertex has its other extremity in the $i$-th subtree
$T_i^{(p)}(t)$ is

$$\frac{2(|T_i^{(p)}(t)| - 1) + H_i^{(p)}(t) + \beta |T_i^{(p)}(t)|}{y} = \frac{y_i}{y},$$

independently of the waiting time. Finally, the probability that this edge is intact is $p$,
independently of the preceding variables. We thus see from basic properties of independent
exponential variables that $\mathbf{Y}^{(p)}$ has the same random evolution as the system
of branching processes with neutral mutations of Section 2 when the reproduction law $\nu$
is simply given by the Dirac mass at $2 + \beta$. In this setting, $Y_0^{(p)}$ corresponds to
$Z_\emptyset^{(p)}$, the sub-population with the ancestral type of Section 2. There is however
an important difference between the way clusters and sub-populations are labeled that should
be stressed to avoid a possible confusion. More precisely, the families
$\mathbf{Z}^{(p)} = (Z_u^{(p)} : u \in \mathbb{U})$ and
$\mathbf{Y}^{(p)} = (Y_j^{(p)} : j \geq 0)$ do represent the same process, and in particular
$(Z_i^{(p)} : i \in \mathbb{N})$ is only a sub-sequence of $(Y_j^{(p)} : j \in \mathbb{N})$
which corresponds to subtrees at distance 1 from the root-subtree $T_0^{(p)}$. Recall that
focussing on sub-populations with a single mutation is crucial to ensure the validity of
Lemma 1.

Define the generation of a vertex as the number of edges $e$ on the branch from this
vertex to the root which have a mark $U_e > p$ (in other words, this is the number of
cuts on that branch). In particular vertices of $T_0^{(p)}$ have generation 0, and those of
$T_1^{(p)}$ have generation 1. We then set $\rho(i) = j$ for $i \geq 1$ when the $j$-th subtree
of $T^{(p)}(t)$ is the $i$-th subtree of the first generation, where, as usual, subtrees in a
family are enumerated according to the increasing order of their birth times. In particular,
we always have $\rho(1) = 1$ and the sequence $(\rho(i) : i \geq 1)$ is strictly increasing.
The following claim should be plain from the discussion above.

Corollary 1. In the notation of Section 2, take $\xi \equiv 1$ and $z = 2 + 2\beta$. Then the
families

$$(Y, Y_0^{(p)}, Y_{\rho(1)}^{(p)}, Y_{\rho(2)}^{(p)}, \ldots) \quad \text{and} \quad (Z, Z_\emptyset^{(p)}, Z_1^{(p)}, Z_2^{(p)}, \ldots)$$

have the same distribution.

In the sequel, it will be convenient to agree that the two families in the statement above
are actually the same (not merely identical in law). Recall also that the algorithm
with preferential attachment is run until time

$$\tau_n = \inf\{t \geq 0 : |T(t)| = n + 1\} = \inf\{t \geq 0 : Y(t) = 2n + \beta(n + 1)\}$$

when the structure has size $n + 1$.

We henceforth assume that the percolation parameter $p = p(n)$ fulfills (1).
The motivation for this choice stems from the next statement, which shows that both the
root-cluster and its complement are then macroscopic (i.e. of size of order $n$). For the
sake of simplicity, we shall frequently write $p$ rather than $p(n)$, omitting the integer
$n$ from the notation. Recall that $\alpha = (1 + \beta)/(2 + \beta)$.



Corollary 2. We have

$$\lim_{n \to \infty} \frac{Y_0^{(p)}(\tau_n)}{n} = (2 + \beta) e^{-\alpha c} \quad \text{in probability}.$$

Proof: We know from Corollary 1 and Lemmas 2 and 3 that

$$\lim_{n \to \infty} e^{-m_1 \tau_n} Y(\tau_n) = \lim_{n \to \infty} e^{-m_1(p) \tau_n} Y_0^{(p)}(\tau_n) = W(\infty) \quad \text{in probability},$$

with $m_1 = 2 + \beta$ and $m_1(p) = 1 + p(1 + \beta)$. By the definition of $\tau_n$, we have
$Y(\tau_n) = 2n + \beta(n + 1)$, hence

$$e^{m_1 \tau_n} \sim \frac{(2 + \beta) n}{W(\infty)},$$

and, a fortiori, $\tau_n \sim (2 + \beta)^{-1} \ln n$. Our claim follows since

$$m_1 - m_1(p) = (1 - p(n))(1 + \beta) \sim \frac{c(1 + \beta)}{\ln n},$$

thanks to (1). □
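
The last step of this proof can be spelled out as follows. This is only a sketch of the arithmetic, with all asymptotic relations understood in probability as $n \to \infty$.

```latex
% Step 1: the exponent (m_1 - m_1(p)) \tau_n converges to \alpha c.
(m_1 - m_1(p)) \, \tau_n
  = (1 - p(n))(1 + \beta) \, \tau_n
  \sim \frac{c (1 + \beta)}{\ln n} \cdot \frac{\ln n}{2 + \beta}
  = \alpha c .
% Step 2: combine with e^{m_1 \tau_n} \sim (2 + \beta) n / W(\infty).
Y_0^{(p)}(\tau_n)
  \sim W(\infty) \, e^{m_1(p) \tau_n}
  = W(\infty) \, e^{m_1 \tau_n} \, e^{-(m_1 - m_1(p)) \tau_n}
  \sim (2 + \beta) \, n \, e^{-\alpha c} .
```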
Next, let

$$N^{(p)}(t) = \max\{j : T_j^{(p)}(t) \neq \emptyset\}$$

denote the number of subtrees at time $t$, discounting the root-subtree $T_0^{(p)}(t)$
containing $0$. Recall also that $M^{(p)}(t)$ denotes the number of sub-populations with a
single mutation at time $t$, that is of subtrees at time $t$ which are at unit distance from
$T_0^{(p)}(t)$. We shall now observe that when $p$ is close to 1, these two quantities
coincide with high probability as long as they are not too large. In this direction, recall
that $p = p(n)$ fulfills (1), and observe from Lemma 4 that for each fixed $i \geq 1$, the time
$b_i^{(p)}$ of the $i$-th mutation within the sub-population with the ancestral type (i.e. the
first instant when $M^{(p)}$ reaches $i$) fulfills

$$b_i^{(p)} = \frac{1}{m_1(p)} \ln \frac{1}{1 - p} + O(1) = \frac{\ln \ln n}{2 + \beta} + O(1) \quad \text{as } n \to \infty.$$

Lemma 6. Set $\Delta^{(p)}(t) = N^{(p)}(t) - M^{(p)}(t)$ for the number of subtrees at distance
strictly greater than 1 from the root-cluster at time $t$. For every $r > 0$, we have

$$\lim_{n \to \infty} \mathbb{E}\big( \Delta^{(p)}\big( (2 + \beta)^{-1} \ln \ln n + r \big) \big) = 0.$$


Proof: Roughly speaking, the dynamics of $\mathbf{Y}^{(p)}$ show that the counting process
$N^{(p)}$ grows at rate $(1 - p) Y$, which means rigorously that the predictable compensator
of $N^{(p)}$ is absolutely continuous with density $(1 - p) Y$. In other words,
$N^{(p)}(t) - (1 - p) \int_0^t Y(s) \, ds$ is a martingale, and thus

$$\mathbb{E}\big( N^{(p)}\big( (2 + \beta)^{-1} \ln \ln n + r \big) \big) = (1 - p) \int_0^{(2 + \beta)^{-1} \ln \ln n + r} \mathbb{E}(Y(s)) \, ds.$$

Similarly, the counting process $M^{(p)}$ grows at rate $(1 - p) Y_0^{(p)}$, and

$$\mathbb{E}\big( M^{(p)}\big( (2 + \beta)^{-1} \ln \ln n + r \big) \big) = (1 - p) \int_0^{(2 + \beta)^{-1} \ln \ln n + r} \mathbb{E}(Y_0^{(p)}(s)) \, ds.$$

We deduce from Lemma 5 that

$$\mathbb{E}(Y(s)) = 2(1 + \beta) e^{(2 + \beta) s} \quad \text{and} \quad \mathbb{E}(Y_0^{(p)}(s)) = 2(1 + \beta) e^{(2 + \beta - (1 - p)(1 + \beta)) s},$$

and our claim then follows from (1). □
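
One way to carry out this last step is sketched below, writing $t_n = (2 + \beta)^{-1} \ln \ln n + r$ (a shorthand introduced here) and using the elementary inequality $1 - e^{-x} \leq x$ for $x \geq 0$.

```latex
% Difference of the two expectations:
\mathbb{E}\big( \Delta^{(p)}(t_n) \big)
  = (1 - p) \int_0^{t_n} 2 (1 + \beta) \, e^{(2 + \beta) s}
      \big( 1 - e^{-(1 - p)(1 + \beta) s} \big) \, ds
% Bound 1 - e^{-x} by x, and s by t_n:
  \leq 2 (1 + \beta)^2 (1 - p)^2 \, t_n \int_0^{t_n} e^{(2 + \beta) s} \, ds
  \leq \frac{2 (1 + \beta)^2}{2 + \beta} \, (1 - p)^2 \, t_n \, e^{(2 + \beta) t_n} .
% Since e^{(2 + \beta) t_n} = e^{(2 + \beta) r} \ln n and (1 - p)^2 \sim c^2 / \ln^2 n by (1),
% the right-hand side is of order
O\!\left( \frac{\ln \ln n}{\ln n} \right) \longrightarrow 0 .
```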
Lemma 6 entails in particular that for each fixed $k \geq 1$, the probability that the
$k$-tuples of processes $(Y_i^{(p)})_{1 \leq i \leq k}$ and $(Z_i^{(p)})_{1 \leq i \leq k}$
coincide tends to 1 as $n \to \infty$. This enables us to deduce the asymptotic behavior of
the former from Theorem 2. We shall use the same notation as there, specialized to the
setting of this present section. That is, $W'(\infty)$ denotes the terminal value of the
martingale $e^{-(2 + \beta) t} Y(t)$ given $Y(0) = 1 + \beta$, $(W_i'(\infty))_{i \geq 1}$ is a
sequence of i.i.d. copies of $W'(\infty)$, and $(S_i)_{i \geq 1}$ an independent random walk
whose steps have the standard exponential distribution.

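Corollary 3. The sequence

\[ \left( \frac{\ln n}{n}\, Y_i^{(p)}(\tau_n) : i \ge 1 \right) \]

converges in the sense of finite-dimensional distributions as $n \to \infty$ towards

\[ \left( c\,e^{-\alpha c}\, \frac{W_i'(\infty)}{S_i} : i \ge 1 \right). \]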

Proof: Recall that $(m_1 - m_1(p))\tau_n \to \alpha c$, as proved in Corollary 2. Hence

\[ \exp(-m_1(p)\tau_n) = \exp\bigl((m_1 - m_1(p))\tau_n\bigr)\exp(-m_1\tau_n) \sim e^{\alpha c}\,\frac{W(\infty)}{(2+\beta)n}, \]

and our claim follows from Theorem 2 specified in the present setting with
$\tau(p) = \tau(p(n)) = \tau_n$. □

Next, we easily translate the above limit theorem for the branching processes in 
terms of the sizes of the subtrees listed in the increasing order of their ages. 

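Corollary 4. We have

\[ \lim_{n\to\infty} n^{-1}\,|T_0^{(p)}(\tau_n)| = e^{-\alpha c}, \]

and the sequence

\[ \left( \frac{\ln n}{n}\, |T_i^{(p)}(\tau_n)| : i \ge 1 \right) \]

converges as $n \to \infty$, in the sense of finite-dimensional distributions, towards

\[ \left( (2+\beta)^{-1}\, c\,e^{-\alpha c}\, \frac{W_i'(\infty)}{S_i} : i \ge 1 \right). \]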

Proof: We focus on the second claim, the proof of the first being similar (and easier) 
using Corollary 2 in place of Corollary 3. 
From Corollary 3, it suffices to show that

\[ Y_i^{(p)}(\tau_n) \sim (2+\beta)\,|T_i^{(p)}(\tau_n)|, \]

and for this, that the number of half-edges pertaining to the $i$-th subtree fulfills

\[ H_i^{(p)}(\tau_n) = o\bigl(Y_i^{(p)}(\tau_n)\bigr). \tag{7} \]
In this direction, recall that the $i$-th jump time $\gamma_i^{(p)} := \inf\{t \ge 0 : N^{(p)}(t) = i\}$ of the
process $N^{(p)}$ that counts the number of subtrees as time passes is a stopping time which
corresponds to the birth-time of the $i$-th subtree $T_i^{(p)}$. We observe from the dynamics
described at the beginning of this section and the strong Markov property that the process

\[ H_i^{(p)}(\gamma_i^{(p)}+t) - (1-p)\int_0^t Y_i^{(p)}(\gamma_i^{(p)}+s)\,ds, \qquad t \ge 0, \]

is a martingale. Similarly,

\[ Y_i^{(p)}(\gamma_i^{(p)}+t) - \bigl(1-p+p(2+\beta)\bigr)\int_0^t Y_i^{(p)}(\gamma_i^{(p)}+s)\,ds, \qquad t \ge 0, \]

is also a martingale. It follows that

\[ L^{(p)}(t) := H_i^{(p)}(\gamma_i^{(p)}+t) - \frac{1-p}{1-p+p(2+\beta)}\, Y_i^{(p)}(\gamma_i^{(p)}+t) \]

is a martingale; note also that its jumps $|L^{(p)}(t) - L^{(p)}(t-)|$ have size at most $2+\beta$,
independently of $p$. Since there are at most $n$ jumps up to time $\tau_n - \gamma_i^{(p)}$, the bracket of
$L^{(p)}$ can be bounded by

\[ [L^{(p)}]_{\tau_n - \gamma_i^{(p)}} \le (2+\beta)^2\, n. \]

Hence

\[ E\left(\bigl|L^{(p)}(\tau_n - \gamma_i^{(p)}) - L^{(p)}(0)\bigr|^2\right) \le (2+\beta)^2\, n, \]

and in particular



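\[ \lim_{n\to\infty} E\left(\left|\frac{\ln n}{n}\Bigl(L^{(p)}(\tau_n - \gamma_i^{(p)}) - L^{(p)}(0)\Bigr)\right|^2\right) = 0. \]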

The estimate (7) now follows readily from Corollary 3 and the fact that 1 — p(n) = o(l). 
□ 
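The cancellation behind the martingale property of $L^{(p)}$ can be checked by hand on the two kinds of jumps. The sketch below rests on an assumption that is read off from the two compensators above rather than stated verbatim in the text: when a new vertex attaches to the $i$-th subtree, the edge is kept with probability $p$ (then $Y_i^{(p)}$ increases by $2+\beta$ and $H_i^{(p)}$ is unchanged) and removed with probability $1-p$ (then both increase by 1). The function names are hypothetical.

```python
def l_jumps(p, beta):
    """Jumps of L = H - kappa * Y for the two kinds of attachment events.

    Assumed dynamics (inferred from the compensators (1-p)Y and
    (1-p+p(2+beta))Y): edge kept w.p. p      -> (dH, dY) = (0, 2+beta);
                       edge removed w.p. 1-p -> (dH, dY) = (1, 1).
    """
    kappa = (1.0 - p) / (1.0 - p + p * (2.0 + beta))
    jump_kept = -kappa * (2.0 + beta)
    jump_removed = 1.0 - kappa
    return kappa, jump_kept, jump_removed

def mean_jump(p, beta):
    """Expected jump of L per attachment event; zero means no drift."""
    _, jump_kept, jump_removed = l_jumps(p, beta)
    return p * jump_kept + (1.0 - p) * jump_removed

for p in (0.5, 0.9, 0.99):
    kappa, jk, jr = l_jumps(p, beta=1.0)
    print(p, mean_jump(p, 1.0), max(abs(jk), abs(jr)))
```

Under this reading, the mean jump vanishes for every $p$ and both jump sizes stay below $2+\beta$, in line with the bound used for the bracket.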

Our final task is to deduce from Corollary 3 a limit theorem for the sizes of the percolation
clusters listed in the decreasing order of their sizes, rather than their ages. Roughly
speaking, we shall check that the largest clusters are given by the older subtrees, in the
sense that for every fixed $k$, with high probability when $n \to \infty$ and $i \to \infty$, the $k$
largest percolation clusters of $T^{(p)}(\tau_n)$ are to be found amongst the $i$ oldest subtrees.

Recall from Lemma 4 that for each fixed $i \ge 1$, the $i$-th oldest subtree $T_i^{(p)}(\tau_n)$ was
born at time $(2+\beta)^{-1}\ln\ln n + O(1)$, and from Corollary 3 that its size is of order $n/\ln n$.
We thus have to check that it is unlikely to have at time $\tau_n$ a subtree of size of order $n/\ln n$ or
greater which was born at a much later time than $(2+\beta)^{-1}\ln\ln n$. Here is a formal
statement, which is expressed for convenience in terms of the processes $Y_k^{(p)}$.

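Lemma 7. For every $\varepsilon > 0$, we have

\[ \lim_{r\to\infty}\limsup_{n\to\infty} P\left(\exists k \ge 1 : Y_k^{(p)}\bigl((2+\beta)^{-1}\ln\ln n + r\bigr) = 0 \text{ and } Y_k^{(p)}(\tau_n) > \varepsilon n/\ln n\right) = 0. \]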
Proof: Let $(\mathcal{F}_t)_{t\ge 0}$ denote the natural filtration generated by the (continuous time
version of) the algorithm with preferential attachment, including the uniform marks on the
edges. The counting process $N^{(p)}$ is $(\mathcal{F}_t)$-adapted, and its jump times
$\gamma_k^{(p)} := \inf\{t \ge 0 : N^{(p)}(t) = k\}$ are stopping times that correspond to the birth-times of subtrees.

An application of the strong Markov property to the algorithm (recall also Lemma 5)
shows that for each $k \ge 1$, the process $(2+\beta)^{-1}Y_k^{(p)}(\cdot + \gamma_k^{(p)})$ is a Yule process with
birth rate $2+\beta$ per unit population size, started from $(1+\beta)/(2+\beta) = \alpha < 1$, and
independent of $\mathcal{F}_{\gamma_k^{(p)}}$. Plainly, the latter can be bounded from above by a Yule process with
the same birth rate and started at 1; in particular $(2+\beta)^{-1}Y_k^{(p)}(u + \gamma_k^{(p)})$ is stochastically
bounded from above by a geometric variable with parameter $\exp(-(2+\beta)u)$ (see, e.g.,
Athreya and Ney [2004], page 109). That is, the tail distribution of $(2+\beta)^{-1}Y_k^{(p)}(u + \gamma_k^{(p)})$
admits the bound $\ell \mapsto \bigl(1 - \exp(-(2+\beta)u)\bigr)^{\ell}$.

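It is convenient to write $r_n = (2+\beta)^{-1}\ln\ln n + r$ and $s_n = (2+\beta)^{-1}\ln n + s$. Fix $s > 0$
arbitrarily large, and consider the number of processes $Y_k^{(p)}$ which are born after time $r_n$
and reach a size greater than $\varepsilon n/\ln n$ at time $s_n$, namely

\[ X_n := \sum_{k=1}^{\infty} \mathbf{1}_{\{r_n < \gamma_k^{(p)} \le s_n\}}\, \mathbf{1}_{\{Y_k^{(p)}(s_n) > \varepsilon n/\ln n\}}
 = \int_{r_n}^{s_n} \mathbf{1}_{\{Y^{(p)}_{N^{(p)}(t)}(s_n) > \varepsilon n/\ln n\}}\, dN^{(p)}(t). \]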



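The preceding observations entail that

\[ \begin{aligned}
E(X_n) &\le E\left(\int_{r_n}^{s_n} \bigl(1 - \exp(-(2+\beta)(s_n - t))\bigr)^{(2+\beta)^{-1}\varepsilon n/\ln n}\, dN^{(p)}(t)\right) \\
&\le E\left(\int_{r_n}^{s_n} \exp\left(-\frac{\varepsilon n}{(2+\beta)\ln n}\, \exp(-(2+\beta)(s_n - t))\right) dN^{(p)}(t)\right),
\end{aligned} \]

where in the second line, we used the inequality $(1-x)^a \le \exp(-ax)$.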
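The two ingredients of this bound are easy to check numerically: the geometric tail of a Yule process started from one individual, and the elementary inequality $(1-x)^a \le \exp(-ax)$. The sketch below assumes the standard fact that a Yule process with per-capita birth rate $\lambda$, started from 1, has at time $u$ a geometric law on $\{1, 2, \ldots\}$ with parameter $e^{-\lambda u}$; the function names are hypothetical.

```python
import math

def yule_tail(level, rate, u):
    """P(Z(u) > level) for a Yule process Z with Z(0) = 1 and per-capita
    birth rate `rate`: Z(u) is geometric on {1, 2, ...} with success
    probability exp(-rate * u), so the tail at an integer level is
    (1 - exp(-rate * u)) ** level."""
    return (1.0 - math.exp(-rate * u)) ** level

def exp_bound(level, rate, u):
    """Upper bound exp(-level * exp(-rate * u)) from (1 - x)^a <= exp(-a x)."""
    return math.exp(-level * math.exp(-rate * u))

beta = 1.0
for u in (0.1, 1.0, 3.0):
    for level in (1, 10, 1000):
        print(u, level, yule_tail(level, 2 + beta, u), exp_bound(level, 2 + beta, u))
```

In each printed row the geometric tail sits below the exponential bound; the bound is informative only when the level is large compared with $e^{(2+\beta)u}$, which is the regime used in the proof.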

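Next recall that the counting process $N^{(p)}$ has a predictable compensator which is
absolutely continuous with density $(1-p)Y$. This enables us to express the last quantity
above as

\[ (1-p)\, E\left(\int_{r_n}^{s_n} Y(t)\, \exp\left(-\frac{\varepsilon n}{(2+\beta)\ln n}\, \exp(-(2+\beta)(s_n - t))\right) dt\right). \]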

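Recall also that $E(Y(t)) = 2(1+\beta)e^{(2+\beta)t}$; we arrive at

\[ \begin{aligned}
E(X_n) &\le 2(1+\beta)(1-p) \int_{r_n}^{s_n} e^{(2+\beta)t} \exp\left(-\frac{\varepsilon n}{(2+\beta)\ln n}\, \exp(-(2+\beta)(s_n - t))\right) dt \\
&= \frac{2(1+\beta)}{2+\beta}\,(1-p) \int_{e^{(2+\beta)r_n}}^{e^{(2+\beta)s_n}} \exp\left(-x\, \frac{\varepsilon n}{(2+\beta)\ln n}\, \exp(-(2+\beta)s_n)\right) dx \\
&\le 2(1+\beta)(1-p)\, \frac{\ln n}{\varepsilon n}\, \exp((2+\beta)s_n)\, \exp\left(-e^{(2+\beta)r_n}\, \frac{\varepsilon n}{(2+\beta)\ln n}\, \exp(-(2+\beta)s_n)\right).
\end{aligned} \]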
Plugging into the last expression the values of $r_n$ and $s_n$ and using (1), we conclude
that

\[ \limsup_{n\to\infty} E(X_n) \le c'\,\varepsilon^{-1}\, e^{(2+\beta)s} \exp\left(-\varepsilon\, e^{(2+\beta)(r-s)}/(2+\beta)\right), \]

where $c'$ is some constant depending only on $\alpha$, $\beta$ and $c$. This quantity goes to 0 as
$r \to \infty$, for every fixed $\varepsilon$ and $s$, and this entails in particular that

\[ \lim_{r\to\infty}\limsup_{n\to\infty} P\left(\exists k : Y_k^{(p)}(r_n) = 0 \text{ and } Y_k^{(p)}(s_n) > \varepsilon n/\ln n\right) = 0. \]

Since (from Lemma 5)

\[ \lim_{s\to\infty} P\bigl(\tau_n > s_n = (2+\beta)^{-1}\ln n + s\bigr) = 0, \]

this completes the proof. □
We are now able to complete this paper and establish Theorem 1. Recall that the
scale-free tree $T_n$ with $n + 1$ vertices can be obtained as $T(\tau_n)$, and that the first claim
of Theorem 1 is also the first claim of Corollary 4. The key issue for the second claim
is the following. Corollary 3 provides a limit theorem in the sense of finite-dimensional 
distributions for the normalized sequence of the sizes of the subtrees ordered by age, 
whereas Theorem 1 concerns ordering by size. 

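Lemma 7 and the fact that $|T_i^{(p)}(t)| - 1 \le (2+\beta)^{-1}Y_i^{(p)}(t)$ enable us to assert that if
we use the notation $(x_i)^{\downarrow}$ for the decreasing rearrangement of a sequence of nonnegative
real numbers $(x_i)$ which converges to 0, then

\[ \left( \frac{\ln n}{n}\, |T_i^{(p)}(\tau_n)| : i \ge 1 \right)^{\downarrow} \]

converges as $n \to \infty$, in the sense of finite-dimensional distributions, towards

\[ \left( (2+\beta)^{-1}\, c\,e^{-\alpha c}\, \frac{W_i'(\infty)}{S_i} : i \ge 1 \right)^{\downarrow}. \]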
So all that is needed now is to identify explicitly the distribution of the limit above. In
this direction, we start by specifying the law of the i.i.d. variables $W_i'(\infty)$. From Lemma 5
and standard properties of Yule processes (recall also the notation $\alpha = (1+\beta)/(2+\beta)$), we
get that if $Y'$ denotes a version of the branching process $Y$ started from $Y'(0) = 1+\beta$,
then

\[ \lim_{t\to\infty} e^{-(2+\beta)t}\, Y'(t) = W'(\infty) \qquad \text{a.s. and in } L^2(P), \]

where $W'(\infty)$ is a gamma variable with (shape and rate) parameter $(\alpha, 1/(2+\beta))$, that
is

\[ P(W'(\infty) \in dw) = \Gamma(\alpha)^{-1}\,(2+\beta)^{-\alpha}\, w^{\alpha-1}\, e^{-w/(2+\beta)}\, dw, \qquad w > 0. \]
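For readers who wish to experiment with this law, the following sketch evaluates the density and draws samples with Python's standard library. It assumes only the gamma description just given (shape $\alpha$, rate $1/(2+\beta)$, hence scale $2+\beta$); the function names and the choice $\beta = 1$ are illustrative.

```python
import math
import random

def w_prime_density(w, beta):
    """Density of W'(infinity): gamma with shape alpha = (1+beta)/(2+beta)
    and rate 1/(2+beta), evaluated at w > 0."""
    alpha = (1.0 + beta) / (2.0 + beta)
    scale = 2.0 + beta
    return w ** (alpha - 1.0) * math.exp(-w / scale) / (math.gamma(alpha) * scale ** alpha)

def sample_w_prime(beta, size, seed=0):
    """Draw `size` samples; random.gammavariate takes (shape, scale)."""
    rng = random.Random(seed)
    alpha = (1.0 + beta) / (2.0 + beta)
    return [rng.gammavariate(alpha, 2.0 + beta) for _ in range(size)]

samples = sample_w_prime(beta=1.0, size=100_000)
print(sum(samples) / len(samples))  # should be close to alpha * (2 + beta) = 1 + beta
```

The empirical mean should be near $\alpha(2+\beta) = 1+\beta$, the mean of a gamma variable with these parameters.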
Recall that $(S_i)_{i\ge 0}$ is a random walk with standard exponentially distributed steps,
which is independent of the $(W_i'(\infty))_{i\ge 1}$. It follows that $(S_i, W_i'(\infty))_{i\ge 1}$ can be viewed as the
sequence of the atoms of a Poisson point process on $\mathbb{R}_+ \times \mathbb{R}_+$ with intensity

\[ ds \otimes (2+\beta)^{-\alpha}\, \frac{w^{\alpha-1}}{\Gamma(\alpha)}\, e^{-w/(2+\beta)}\, dw. \]

The image of this measure by the map

\[ (s, w) \mapsto x = c\,e^{-\alpha c}\,(2+\beta)^{-1}\, \frac{w}{s} \]

is again a Poisson random measure, now with intensity $c\,e^{-\alpha c}\,x^{-2}\,dx$, which establishes the
second part of Theorem 1. Finally, the alternative description in terms of the inverses
of the atoms belongs to the folklore of Poisson random measures (simply note that the
image of $c\,e^{-\alpha c}\,x^{-2}\,dx$ by the map $x \mapsto 1/x$ is $c\,e^{-\alpha c}\,dx$).
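The inversion remark also gives a direct way to simulate the largest atoms of such a Poisson random measure. The sketch below is written for a generic intensity $\lambda x^{-2}\,dx$, with $\lambda$ playing the role of $c\,e^{-\alpha c}$; it relies on the standard fact that the points of a homogeneous Poisson process with intensity $\lambda\,dx$, listed in increasing order, can be written as $S_i/\lambda$ for a random walk $(S_i)$ with standard exponential steps. The function name and parameter values are hypothetical.

```python
import random

def largest_atoms(lam, k, seed=0):
    """Return the k largest atoms x_1 > x_2 > ... of a Poisson random measure
    on (0, inf) with intensity lam * x**-2 dx.

    Uses the inversion x -> 1/x: the points 1/x_i form a Poisson process
    with intensity lam * dx, i.e. 1/x_i = S_i / lam where S_i is a random
    walk with standard exponential steps.
    """
    rng = random.Random(seed)
    s = 0.0
    atoms = []
    for _ in range(k):
        s += rng.expovariate(1.0)   # S_i = S_{i-1} + Exp(1)
        atoms.append(lam / s)
    return atoms

print(largest_atoms(lam=0.5, k=5))
```

The returned list is decreasing by construction, which matches the ordering by size used in Theorem 1.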